Coping or Causing? The Global/Democratic Relationship Between Substance Use and Mental Illness#

  • Group: E2

  • Date: 27-06-2025

  • Course: Information Visualisation

  • Authors:

    • Tide Benevento (15239349)

    • Sophie Gierstberg (15551490)

    • Sarah Out (15638472)

    • Palina Vasilyeva (15675548)

Introduction#

The connection between substance use and mental health has become increasingly relevant. More and more people turn to alcohol or drugs, not just for enjoyment or social reasons, but often to deal with deeper struggles. Whether it’s stress, anxiety, or depression, substance use is sometimes seen as a quick escape or temporary relief. But while these coping mechanisms may offer short-term comfort, the long-term effects are often more complex. In this data story, we explore whether there is a meaningful relationship between substance use and the prevalence of mental illness across countries between 2000 and 2017.

The first perspective suggests that higher rates of mental illness lead to increased drug use. Our first argument says that there is a strong positive correlation between drug use and anxiety levels. Our second argument states that there is a positive link between drug use and depression.

The second perspective argues that mental illness is a driving factor behind higher alcohol consumption. Our first argument is that countries with higher overall alcohol consumption also report higher anxiety levels. The second argument focuses on depression, stating that higher alcohol consumption also relates to higher depression rates.

A third viewpoint focuses on the role of democracy, suggesting that continents with stronger democratic values show higher alcohol consumption and drug use rates. For this perspective, we used the average democracy index per continent to keep the visualizations clear and interpretable. Countries with stronger democratic values often show higher alcohol consumption, as alcohol is more accessible, socially accepted and integrated into daily life (Inman et al., 2017). These freedoms may not directly cause more mental illness but can shape substance use patterns. Our first argument states that there is a positive relationship between a higher democracy index and higher alcohol consumption rates. Our second argument says that countries with higher democratic values also tend to show higher levels of drug use.

In this data story, we explore how these different variables interact, and whether they reveal deeper patterns between substance use, mental health and the societies we live in.

Dataset and Preprocessing#

The first step was to find datasets that provide the necessary information. For this project, we needed datasets on different types of alcohol consumption per country and per year, a dataset on mental health disorders per country, and a dataset containing the democratic index per continent. For completeness, we chose to use data from the period 2000 to 2017.

Alcohol Consumption Dataset#

We used data from Our World in Data, which publishes global statistics on alcohol consumption (Ritchie & Roser, 2022). The original datasets can be downloaded directly from https://ourworldindata.org/alcohol-consumption.

The following datasets were used:

  • Total alcohol consumption per capita

This dataset contains the average alcohol consumption per adult (15+ years old), measured in litres of pure alcohol, per year, per country.

  • Beer consumption per capita

This dataset provides the average beer consumption per adult, per year, measured in litres of pure alcohol, per country.

  • Wine consumption per capita

This dataset shows the average wine consumption per adult, per year, measured in litres of pure alcohol, per country.

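  • Spirits consumption per capita

This dataset shows the average spirits consumption per adult, per year, measured in litres of pure alcohol, per country.

Each dataset contained the variables country and year, along with either the specific type of alcohol consumption or the total consumption.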

Mental Health Dataset#

The other dataset we needed covers mental illnesses per country. The most recent and reliable dataset that we could find can be downloaded from https://www.kaggle.com/datasets/thedevastator/uncover-global-trends-in-mental-health-disorder. The data in this dataset ranges from 1991 to 2019. However, the values we needed were missing for 2018 and 2019, making 2017 the most recent year with usable values per country. This dataset contains the following variables:

  • Entity

  • Country code

  • Year

  • Schizophrenia (%)

  • Bipolar disorder (%)

  • Eating disorders (%)

  • Anxiety disorders (%)

  • Drug use disorders (%)

  • Depression (%)

  • Alcohol use disorders (%)

From this dataset, we focused on Depression (%), Anxiety disorders (%) and Drug use disorders (%) (along with Entity and Year). We excluded the other variables because of their strong genetic basis or because they were not suitable for this analysis.
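The column selection described above can be sketched in pandas. This is a minimal illustration on invented values, not our actual preprocessing script: the country names and numbers are made up, and the target column names are taken from the final dataset listed later in this data story.

```python
import pandas as pd

# Toy stand-in for the mental health table; all values are invented.
raw = pd.DataFrame({
    "Entity": ["Country A", "Country A", "Country B", "Country B"],
    "Year": [1999, 2005, 2010, 2018],
    "Schizophrenia (%)": [0.2, 0.2, 0.3, 0.3],
    "Anxiety disorders (%)": [3.1, 3.2, 4.0, 4.1],
    "Drug use disorders (%)": [0.5, 0.6, 0.9, 1.0],
    "Depression (%)": [3.5, 3.6, 4.2, 4.3],
})

# Columns to keep, mapped to the names used in the final dataset.
keep = {
    "Entity": "Country",
    "Year": "Year",
    "Depression (%)": "Depression_rate",
    "Anxiety disorders (%)": "Anxiety_rate",
    "Drug use disorders (%)": "Drug_use_rate",
}

mental = raw[list(keep)].rename(columns=keep)

# Series.between is inclusive on both ends, so 2000 and 2017 are kept.
mental = mental[mental["Year"].between(2000, 2017)].reset_index(drop=True)
print(mental)
```

Selecting through the `keep` mapping drops every unused disorder column in one step and keeps the renaming next to the selection.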

Democratic dataset#

We used data from https://www.idea.int/democracytracker/gsod-indices/, provided by the International Institute for Democracy and Electoral Assistance (IDEA). This dataset offers an overview of the democratic development of countries worldwide, across different dimensions of democracy. We selected the different continents and the four aspects of democracy over the time period of 2000 to 2017, resulting in a dataset with the following variables:

  • Year

  • Country / Region

  • Participation

  • Rule of Law

  • Rights

  • Representation

We calculated the average score of the four dimensions, using Excel’s AVERAGE function (=AVERAGE(range)), resulting in a dataset with a new variable Average_Democratic_Index which replaced the original four separate indicators. The website provided the Americas as a single combined region, so we had to treat them as one and later assigned the same democracy index to both South and North America.
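For readers who prefer to stay in Python, the same row-wise average could be computed with pandas instead of Excel. The sketch below uses invented scores and assumes the four dimension columns are named exactly as in the list above.

```python
import pandas as pd

dims = ["Participation", "Rule of Law", "Rights", "Representation"]

# Invented scores for two regions in one year.
gsod = pd.DataFrame({
    "Year": [2000, 2000],
    "Country / Region": ["Europe", "Asia"],
    "Participation": [0.6, 0.4],
    "Rule of Law": [0.7, 0.5],
    "Rights": [0.8, 0.4],
    "Representation": [0.7, 0.3],
})

# Row-wise mean of the four dimensions; like Excel's AVERAGE,
# pandas skips missing values by default.
gsod["Average_Democratic_Index"] = gsod[dims].mean(axis=1)

# The new variable replaces the four separate indicators.
gsod = gsod.drop(columns=dims)
print(gsod)
```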

Merging#

All the cleaned datasets were merged into the final ‘clean’ dataset, which is the dataset used for the data analysis. Because the alcohol datasets contained more countries than the mental health dataset, we retained only the overlapping country-year combinations. This meant we only kept the rows for which we had data in all six datasets. After merging, we manually verified the data and made corrections where necessary.
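Keeping only the overlapping country-year combinations corresponds to an inner join. The following sketch shows the idea on three tiny invented tables; the use of `functools.reduce` is one possible way to chain the merges and is not necessarily how our script was written.

```python
from functools import reduce

import pandas as pd

# Three invented tables with partially overlapping (Entity, Year) keys.
beer = pd.DataFrame({"Entity": ["A", "A", "B"], "Year": [2000, 2001, 2000],
                     "Beer_consumption": [1.0, 1.1, 2.0]})
wine = pd.DataFrame({"Entity": ["A", "B", "C"], "Year": [2000, 2000, 2000],
                     "Wine_consumption": [0.5, 0.7, 0.9]})
mental = pd.DataFrame({"Entity": ["A", "B"], "Year": [2000, 2001],
                       "Depression_rate": [3.0, 4.0]})

# An inner merge keeps a row only if its key exists in both tables,
# so chaining inner merges keeps keys present in every table.
merged = reduce(
    lambda left, right: left.merge(right, on=["Entity", "Year"], how="inner"),
    [beer, wine, mental],
)
print(merged)
```

In this toy case only the combination (A, 2000) appears in all three tables, so it is the only row that survives.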

Pseudocode#

To merge the datasets, we used Python with the pandas package. We also imported the pycountry and pycountry_convert libraries to assign continents.

  1. Load the first five datasets (beer, wine, spirits, total alcohol and mental health).

  2. Strip column names and rename last columns to standardized names.

  3. Keep only Entity, Year and consumption values.

  4. Clean mental health data: drop unused columns, convert Year, filter 2000-2017, rename columns.

  5. Convert and filter Year in alcohol datasets (2000-2017).

  6. Merge all datasets on [Entity, Year].

  7. Keep only countries with complete data for all years.

  8. Assign continents to countries using pycountry and overrides.

  9. Load democracy index data and rename columns.

  10. Map regions to standard continents.

  11. Duplicate Americas entries as South America.

  12. Combine, deduplicate and clean democracy scores.

  13. Merge democracy index into main dataframe on [Continent, Year].

  14. Export final dataframe to CSV.
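Steps 7 and 11 to 13 are the least obvious ones, so a possible pandas sketch is given here. It is an illustration under assumptions: the data is invented, the year range is shortened to three years, and the Continent column is filled in by hand instead of through pycountry_convert.

```python
import pandas as pd

YEARS = range(2000, 2003)  # shortened for the toy example; the data story uses 2000-2017

# Invented main table; "Chad" is missing one year on purpose.
main = pd.DataFrame({
    "Country": ["Brazil"] * 3 + ["Canada"] * 3 + ["Chad"] * 2,
    "Year": [2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001],
    "Continent": ["South America"] * 3 + ["North America"] * 3 + ["Africa"] * 2,
})

# Step 7: keep only countries that have a row for every year.
n_years = main.groupby("Country")["Year"].transform("nunique")
main = main[n_years == len(YEARS)]

# Invented democracy scores, with the Americas as one combined region.
democracy = pd.DataFrame({
    "Region": ["Americas"] * 3 + ["Africa"] * 3,
    "Year": [2000, 2001, 2002] * 2,
    "Avg_democratic_index": [0.60, 0.61, 0.62, 0.40, 0.41, 0.42],
})

# Steps 10-12: copy the Americas rows to both American continents,
# then deduplicate on the merge keys.
americas = democracy[democracy["Region"] == "Americas"]
democracy = pd.concat([
    democracy[democracy["Region"] != "Americas"],
    americas.assign(Region="North America"),
    americas.assign(Region="South America"),
]).rename(columns={"Region": "Continent"}).drop_duplicates(["Continent", "Year"])

# Step 13: attach the continent-level index to every country row.
final = main.merge(democracy, on=["Continent", "Year"], how="left")
print(final)
```

A left merge is used in the last step so that a country row is never dropped silently; a missing democracy score would show up as NaN and can be checked by hand.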

Final Dataset#

The final clean dataset, with the name ‘Dataset_v1’, contains the following variables:

  • Country

  • Year

  • Beer_consumption

  • Wine_consumption

  • Spirits_consumption

  • Total_alcohol_consumption

  • Depression_rate

  • Anxiety_rate

  • Drug_use_rate

  • Continent

  • Avg_democratic_index

Visualisations#

Imports and data collection#

import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import plotly.graph_objects as go

from scipy.stats import linregress

import seaborn as sns
try:
    from google.colab import files
    uploaded = files.upload()
    filename = next(iter(uploaded))
except ModuleNotFoundError:
    # fallback for non-Colab environment
    filename = 'Dataset_v1.xlsx'

Perspective 1 - Higher mental illness rates lead to higher drug use rates.#

Argument 1 - There is a strong positive correlation between drug use and anxiety.#

The heatmap below uses colour intensity to show the relationships between anxiety, depression and drug use. Each square represents the correlation between one pair of variables.

As the medium blue colour in the correlation heatmap shows, there is a positive correlation between drug use and anxiety rates across countries. The Pearson correlation coefficient between the variables Drug_use_rate and Anxiety_rate is r=0.58, which indicates a moderately strong positive correlation (Turney, 2023).

This suggests that countries with higher drug use also tend to have higher anxiety rates. While this does not imply causation, the positive association across countries points to a link between these two variables.
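To make the coefficient less of a black box, the sketch below computes Pearson's r by hand and compares it with the pandas result. The numbers are invented and do not come from our dataset; `pearson_r` is a helper written for this illustration.

```python
import numpy as np
import pandas as pd


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Invented values, only to show the calculation.
toy = pd.DataFrame({
    "Drug_use_rate": [0.4, 0.6, 0.9, 1.2, 1.5],
    "Anxiety_rate": [2.9, 3.8, 3.5, 4.6, 4.4],
})

r_manual = pearson_r(toy["Drug_use_rate"], toy["Anxiety_rate"])
r_pandas = toy["Drug_use_rate"].corr(toy["Anxiety_rate"])  # Pearson by default
print(round(r_manual, 3), round(r_pandas, 3))
```

Both values agree, which is a quick sanity check that `DataFrame.corr()` in the heatmap code reports the same quantity.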

df = pd.read_excel("Dataset_v1.xlsx")

columns = ['Drug_use_rate', 'Depression_rate', 'Anxiety_rate']
df_selected = df[columns]

corr_matrix = df_selected.corr().round(2)

fig = go.Figure(data=go.Heatmap(
    z=corr_matrix.values,
    x=corr_matrix.columns,
    y=corr_matrix.columns,
    colorscale='RdBu',
    zmid=0,
    colorbar=dict(title='Correlation'),
    hovertemplate='Correlation between %{x} and %{y}: %{z:.2f}<extra></extra>'
))

fig.update_layout(
    title='Correlation Heatmap',
    width=790,
    height=600,
    title_x=0.5,
    margin=dict(t=100, b=100, l=50, r=50),
    paper_bgcolor='linen',
    plot_bgcolor='linen',
    xaxis=dict(side='top'),
    annotations=[
        dict(
            text="Interactive correlation heatmap visualizing the relationships between drug use, depression and anxiety rates. <br>Hover to explore the strength and direction of correlations.",
            xref="paper", yref="paper",
            x=-0.16, y=-0.21,
            showarrow=False,
            font=dict(size=12, color="black"),
            align="left"
        )
    ]
)

fig.show()

Argument 2 - More drug use is linked to higher depression rates.#

The bar graph shows that the relationship between drug use and depression is negative in every continent, with Asia showing the strongest negative correlation (Pearson’s r=-0.97). This contrasts with the weakly positive global correlation (r=0.27) from the heatmap in Argument 1. This is an example of Simpson’s paradox: a trend that appears in the combined data can reverse when the data is split into groups (Alin, 2010). Large differences in average drug use and depression rates between continents cause this shift.

For example, continents like Oceania have both higher average drug use rates and higher depression rates, while continents like Africa have much lower averages in both. When these group-level differences are combined into a single dataset, they can create the illusion of a positive relationship, even though the trend within each group is negative.

The paradox highlights how statistical relationships depend on context and why it’s essential to consider group structure before interpreting summary statistics.
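The mechanism can be reproduced with a small invented example. The two groups below are hypothetical and only mimic the situation described above: one group has low averages on both variables, the other has high averages, and within each group the trend is negative.

```python
import pandas as pd

# Invented data: two groups with very different average levels.
toy = pd.DataFrame({
    "Group": ["Low"] * 4 + ["High"] * 4,
    "Drug_use_rate": [0.2, 0.3, 0.4, 0.5, 1.2, 1.3, 1.4, 1.5],
    "Depression_rate": [3.0, 2.9, 2.8, 2.7, 5.0, 4.9, 4.8, 4.7],
})

# Correlation over all rows, ignoring the groups.
pooled_r = toy["Drug_use_rate"].corr(toy["Depression_rate"])

# Correlation inside each group.
within_r = {
    name: group["Drug_use_rate"].corr(group["Depression_rate"])
    for name, group in toy.groupby("Group")
}

print("pooled:", round(pooled_r, 2))
print("within:", {name: round(r, 2) for name, r in within_r.items()})
```

The pooled correlation is strongly positive because the gap between the groups dominates, while the correlation inside each group is negative.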

df_clean = df.dropna(subset=['Continent', 'Year', 'Drug_use_rate', 'Depression_rate'])

df_yearly = df_clean.groupby(['Continent', 'Year'], as_index=False).agg({
    'Drug_use_rate': 'mean',
    'Depression_rate': 'mean'
})

correlations = []
for continent, group in df_yearly.groupby('Continent'):
    if len(group) >= 3:
        corr = group['Drug_use_rate'].corr(group['Depression_rate'])
        if pd.notna(corr):
            correlations.append({'Continent': continent, 'Correlation': corr})

df_corr = pd.DataFrame(correlations).sort_values(by='Correlation')

fig = px.bar(
    df_corr,
    x='Correlation',
    y='Continent',
    orientation='h',
    color='Continent',
    color_discrete_sequence=px.colors.qualitative.Bold,
    title='Correlation Between Drug Use and Depression Rate per Continent<br><sub>(Based on Yearly Averages)</sub>'
)

fig.update_layout(
    xaxis_title='Pearson Correlation',
    yaxis_title='Continent',
    xaxis=dict(showgrid=True, zeroline=True, zerolinecolor='gray'),
    showlegend=False,
    paper_bgcolor='linen',
    plot_bgcolor='linen',
    width=790,
    height=600,
    title_x=0.5,
    margin=dict(t=100, b=130, l=50, r=50),
    annotations=[
        dict(
            text="Each bar shows the correlation between drug use and depression rates per continent (based on yearly averages). <br>Hover over the bars to explore exact values and compare continent-level patterns interactively.",
            xref="paper", yref="paper",
            x=-0.16, y=-0.30,
            showarrow=False,
            font=dict(size=12, color="black"),
            align="left"
        )
    ]
)

fig.show()

Perspective 2 - Higher mental illness rates cause higher alcohol consumption.#

Argument 1 - A high alcohol consumption rate has a positive relationship with a higher anxiety rate.#

The data provides mixed evidence regarding the proposed positive relationship between alcohol consumption and anxiety rates.

In the choropleth map displaying alcohol consumption per country, darker shades represent higher consumption levels. Russia and countries across Europe, North America and Oceania in particular show relatively high alcohol intake. This pattern is fairly stable over the years.

In the choropleth map displaying the anxiety rates per country, we see that some of these same regions, such as North America and parts of Western Europe, also report higher anxiety levels. However, this pattern is not universally consistent. For instance, some countries with low alcohol consumption in Africa and Asia also report moderate to high anxiety rates, and vice versa. The time slider reveals no strong or consistent trend linking changes in alcohol consumption with anxiety levels over time.

df = pd.read_excel("Dataset_v1.xlsx")

fig_alcohol = px.choropleth(
    df,
    locations='Country',
    locationmode='country names',
    color='Total_alcohol_consumption',
    color_continuous_scale='Oranges',
    animation_frame='Year',
    range_color=(0, 20),
    title='Alcohol Consumption per Country',
    projection='natural earth'
)

fig_alcohol.update_layout(
    width=790,
    height=600,
    title_x=0.5,
    margin=dict(t=100, b=100, l=50, r=50),
    coloraxis_colorbar=dict(len=0.75, title='Alcohol (L/year)'),
    paper_bgcolor='linen',
    plot_bgcolor='linen',
    geo=dict(showframe=False, bgcolor='linen'),
    annotations=[
        dict(
            text="Slide the bar below to explore how total alcohol consumption per country changes over time. <br>Take the world for a spin and explore new countries!",
            xref="paper", yref="paper",
            x=-0.03, y=-0.15,
            showarrow=False,
            font=dict(size=12, color="black"),
            align="left"
        )
    ]
)

fig_anxiety = px.choropleth(
    df,
    locations='Country',
    locationmode='country names',
    color='Anxiety_rate',
    color_continuous_scale='Blues',
    animation_frame='Year',
    range_color=(0, 12),
    title='Anxiety Rate per Country',
    projection='natural earth'
)

fig_anxiety.update_layout(
    coloraxis_colorbar=dict(len=0.75, title='Anxiety (%/pop)'),
    paper_bgcolor='linen',
    plot_bgcolor='linen',
    width=790,
    height=600,
    title_x=0.5,
    margin=dict(t=100, b=100, l=50, r=50),
    geo=dict(showframe=False, bgcolor='linen'),
    annotations=[
        dict(
            text="Slide the bar below to explore how anxiety rates per population vary across countries over the years. <br>Take the world for a spin and explore new countries!",
            xref="paper", yref="paper",
            x=-0.03, y=-0.15,
            showarrow=False,
            font=dict(size=12, color="black"),
            align="left"
        )
    ]
)

fig_alcohol.show()
fig_anxiety.show()

The line chart of regression lines per alcohol type against anxiety rates provides the clearest quantitative insight. It shows that wine and beer consumption have a noticeable positive linear trend with anxiety rates: as consumption per capita increases, so does the anxiety rate. Wine in particular has the steepest slope and the largest increase. Spirits consumption, however, appears to have a much weaker, nearly flat relationship with anxiety.

Taken together, the visualizations suggest that while there may be a positive association between certain types of alcohol and anxiety rates, this is not a universal pattern across all regions or all alcohol types. The geographic variation and time trends indicate that the relationship is complex and not conclusively supported by the visual data alone.
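When reading regression lines, it helps to keep in mind that the slope and the correlation coefficient are separate quantities: the slope depends on the units of the x-axis, while r does not. The sketch below illustrates this on simulated data; the numbers are generated and are not taken from our dataset, and `slope_and_r` is a helper written for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated consumption (litres) and anxiety rates with some noise.
consumption_litres = rng.uniform(0, 5, size=50)
anxiety = 3.0 + 0.4 * consumption_litres + rng.normal(0, 0.5, size=50)


def slope_and_r(x, y):
    """Return the least-squares slope and Pearson's r for x against y."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, r


slope_l, r_l = slope_and_r(consumption_litres, anxiety)
# Same data, expressed in centilitres instead of litres.
slope_cl, r_cl = slope_and_r(consumption_litres * 100, anxiety)

print("litres:      slope=%.4f r=%.3f" % (slope_l, r_l))
print("centilitres: slope=%.4f r=%.3f" % (slope_cl, r_cl))
```

Changing the unit makes the line a hundred times flatter, yet r stays the same. A steeper line therefore means a larger change in anxiety rate per unit of consumption, which is not the same as a tighter relationship.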

df = pd.read_excel("Dataset_v1.xlsx")
latest_df = df[df['Year'] == df['Year'].max()]

alcohol_long = latest_df.melt(
    id_vars=['Country', 'Anxiety_rate'],
    value_vars=['Beer_consumption', 'Wine_consumption', 'Spirits_consumption'],
    var_name='Alcohol_Type',
    value_name='Consumption'
)

color_map = {
    'Beer_consumption': px.colors.qualitative.Bold[0],
    'Wine_consumption': px.colors.qualitative.Bold[1],
    'Spirits_consumption': px.colors.qualitative.Bold[2]
}

fig = go.Figure()

for alcohol_type, color in color_map.items():
    sub_df = alcohol_long[alcohol_long['Alcohol_Type'] == alcohol_type]
    x = sub_df['Consumption']
    y = sub_df['Anxiety_rate']

    slope, intercept, r, p, stderr = linregress(x, y)
    x_vals = np.linspace(x.min(), x.max(), 100)
    y_vals = intercept + slope * x_vals

    fig.add_trace(go.Scatter(
        x=x_vals,
        y=y_vals,
        mode='lines',
        name=alcohol_type.replace('_consumption', '').capitalize(),
        line=dict(color=color, width=3)
    ))

fig.update_layout(
    title="Regression Lines per Alcohol Type vs Anxiety Rate",
    xaxis_title="Consumption (liters per capita)",
    yaxis_title="Anxiety Rate (%)",
    width=790,
    height=600,
    title_x=0.5,
    margin=dict(t=100, b=135, l=50, r=50),
    paper_bgcolor='linen',
    plot_bgcolor='linen',
    annotations=[
        dict(
            text="This plot displays regression lines showing how different types of alcohol consumption relate to anxiety rates. <br>The steeper the line, the larger the change in anxiety rate per litre consumed.",
            xref="paper", yref="paper",
            x=-0.04, y=-0.32,
            showarrow=False,
            font=dict(size=12, color="black"),
            align="left"
        )
    ]
)

fig.show()

Argument 2 - Higher alcohol consumption has a positive relationship with higher depression rates.#

The boxplot shows that countries with higher alcohol consumption levels tend to have higher depression rates on average. The median depression rate increases from the Low to the High alcohol group, suggesting a slightly positive association between alcohol use and depression.
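The grouping into Low, Middle and High can also be written with `pd.qcut`, which splits on quantiles directly. This is an alternative to the quantile and `pd.cut` combination used for the boxplot, shown on invented values.

```python
import pandas as pd

# Invented values for nine hypothetical countries.
toy = pd.DataFrame({
    "Total_alcohol_consumption": [0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 9.0, 11.0, 13.0],
    "Depression_rate": [3.0, 3.2, 3.1, 3.6, 3.5, 3.9, 4.0, 4.2, 4.1],
})

# q=3 creates three groups with (roughly) equal numbers of countries.
toy["Alcohol_Level"] = pd.qcut(
    toy["Total_alcohol_consumption"], q=3, labels=["Low", "Middle", "High"]
)

# Median depression rate per group, the value drawn as the line inside each box.
medians = toy.groupby("Alcohol_Level", observed=True)["Depression_rate"].median()
print(medians)
```

Comparing these medians is the numerical counterpart of comparing the centre lines of the boxes.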

df = pd.read_excel("Dataset_v1.xlsx")
latest = df[df['Year'] == df['Year'].max()].copy()

quantiles = latest['Total_alcohol_consumption'].quantile([0, 0.33, 0.66, 1.0])
bins = [quantiles[0.0], quantiles[0.33], quantiles[0.66], quantiles[1.0]]
labels = ['Low', 'Middle', 'High']
latest['Alcohol_Level'] = pd.cut(latest['Total_alcohol_consumption'], bins=bins, labels=labels, include_lowest=True)

fig = px.box(
    latest,
    x='Alcohol_Level',
    y='Depression_rate',
    color='Alcohol_Level',
    category_orders={'Alcohol_Level': ['Low', 'Middle', 'High']},
    title='Depression Rate per Alcohol Consumption Category',
    labels={'Depression_rate': 'Depression Rate (%)', 'Alcohol_Level': 'Alcohol Consumption Level'},
    color_discrete_sequence=px.colors.qualitative.Bold
)

fig.update_layout(
    paper_bgcolor='linen',
    plot_bgcolor='linen',
    width=790,
    height=750,
    yaxis_range=[0, 7],
    title_x=0.5,
    margin=dict(t=100, b=135, l=50, r=50),
    annotations=[
        dict(
            text="This boxplot compares depression rates across different alcohol consumption levels. Hover to view <br>statistical details and observe how higher consumption levels may relate to increased depression variability. <br>Click the legend items to toggle the variables.",
            xref="paper", yref="paper",
            x=-0.05, y=-0.24,
            showarrow=False,
            font=dict(size=12, color="black"),
            align="left"
        )
    ]
)

fig.show()

Perspective 3 - A higher democratic index implies higher alcohol consumption and drug use rates.#

Argument 1 - There is a positive relationship between a higher democratic index and higher alcohol consumption rates.#

The bubble chart and bar chart suggest a possible positive relationship between democratic index and alcohol consumption, though this relationship is not consistent across all continents. In the bubble chart, each continent is a bubble sized by average democratic index and positioned by total alcohol consumption. Over the 18 years, larger bubbles like Europe and North America tend to appear higher, suggesting that a higher democratic index may be linked to higher alcohol consumption. However, the limited vertical movement and moderate values for regions like South America and Oceania weaken this trend.

df_clean = pd.read_excel("Dataset_v1.xlsx")

df_clean = df_clean.dropna(subset=['Continent', 'Total_alcohol_consumption', 'Avg_democratic_index'])

df_clean = df_clean[df_clean['Avg_democratic_index'].apply(lambda x: isinstance(x, (int, float)) and not pd.isna(x))]

df_agg = df_clean.groupby(['Continent', 'Year']).agg({
    'Total_alcohol_consumption': 'mean',
    'Avg_democratic_index': 'mean'
}).reset_index()


fig = px.scatter(
    df_agg,
    x='Continent',
    y='Total_alcohol_consumption',
    size='Avg_democratic_index',
    color='Continent',
    animation_frame='Year',
    animation_group='Continent',
    title='Alcohol Consumption by Continent (Bubble: Average Democratic Index)',
    labels={
        'Total_alcohol_consumption': 'Total Alcohol Consumption (Liters)',
        'Continent': 'Continent',
        'Avg_democratic_index': 'Average Score'
    },
    range_y=[0, df_agg['Total_alcohol_consumption'].max() + 1],
    size_max=40,
    color_discrete_sequence=px.colors.qualitative.Bold
)

fig.update_layout(
    sliders=[dict(
        y=-0.13
    )],
    updatemenus=[dict(
        type='buttons',
        showactive=False,
        y=-0.13,
        x=-0.05,
        xanchor='left',
        yanchor='top'
    )],
    width=790,
    height=650,
    xaxis_title="Continent",
    yaxis_title="Total Alcohol Consumption (Liters)",
    showlegend=True,
    title_x=0.5,
    margin=dict(t=100, b=100, l=50, r=50),
    paper_bgcolor='linen',
    plot_bgcolor='linen',
    annotations=[
        dict(
            text="Slide the bar below to explore how alcohol consumption and democratic index scores vary by continent over time. <br>Click legend items to filter continents.",
            xref="paper", yref="paper",
            x=-0.05, y=-0.28,
            showarrow=False,
            font=dict(size=12, color="black"),
            align="left"
        )
    ]
)

fig.show()

The bar chart shows Europe leading in both democratic index and the average consumption of wine, beer, and spirits. North and South America also show moderate to high alcohol consumption with a higher democratic index, while Asia, with a lower democratic index, shows the lowest average alcohol consumption. Overall, the data suggests a potential link between democracy and alcohol consumption, particularly in Europe. However, exceptions and limited variation in the bubble chart imply that the relationship is not conclusive across all regions.

df = pd.read_excel("Dataset_v1.xlsx")
df_clean = df.dropna(subset=['Continent', 'Total_alcohol_consumption', 'Avg_democratic_index'])
df_clean = df_clean[df_clean['Avg_democratic_index'].apply(lambda x: isinstance(x, (int, float)) and not pd.isna(x))]

df_grouped = df_clean.groupby('Continent').agg({
    'Spirits_consumption': 'mean',
    'Wine_consumption': 'mean',
    'Beer_consumption': 'mean',
    'Avg_democratic_index': 'mean',
}).reset_index()

continent_list = df_grouped['Continent'].tolist()
bold_colors = px.colors.qualitative.Bold

fig = go.Figure()

fig.add_trace(go.Bar(
    x=list(range(len(continent_list))),
    y=df_grouped['Beer_consumption'],
    name='Beer',
    offsetgroup='1',
    marker_color=bold_colors[0],
    yaxis='y'
))

fig.add_trace(go.Bar(
    x=list(range(len(continent_list))),
    y=df_grouped['Wine_consumption'],
    name='Wine',
    offsetgroup='2',
    marker_color=bold_colors[1],
    yaxis='y'
))
fig.add_trace(go.Bar(
    x=list(range(len(continent_list))),
    y=df_grouped['Spirits_consumption'],
    name='Spirits',
    offsetgroup='3',
    marker_color=bold_colors[2],
    yaxis='y'
))


for i, row in df_grouped.iterrows():
    democracy_val = row['Avg_democratic_index']
    fig.add_trace(go.Scatter(
        x=[i - 0.4, i + 0.4],
        y=[democracy_val, democracy_val],
        mode='lines',
        line=dict(color=bold_colors[3], width=6),
        name='Democracy Index (0–1)',
        yaxis='y2',
        showlegend=(i == 0)
    ))


fig.update_layout(
    title='Alcohol Consumption with Democracy Index per Continent',
    xaxis=dict(
        title='Continent',
        tickmode='array',
        tickvals=list(range(len(continent_list))),
        ticktext=continent_list,
        showgrid=False
    ),
    yaxis=dict(
        # title.font is used instead of the deprecated titlefont attribute
        title=dict(text='Alcohol Consumption (Litres)', font=dict(color='black')),
        tickfont=dict(color='black'),
        range=[0, 6],
        showgrid=False
    ),
    yaxis2=dict(
        title=dict(text='Democracy Index (0–1)', font=dict(color=bold_colors[3])),
        tickfont=dict(color=bold_colors[3]),
        overlaying='y',
        side='right',
        range=[0, 1],
        showgrid=False
    ),
    legend=dict(
        x=1.15,
        y=1,
        xanchor='left',
        yanchor='top',
        orientation='v'
    ),
    barmode='group',
    width=790,
    height=600,
    title_x=0.5,
    margin=dict(t=100, b=140, l=50, r=50),
    paper_bgcolor='linen',
    plot_bgcolor='linen',
    annotations=[
        dict(
            text="This chart displays average alcohol consumption per continent by alcohol type, alongside the average democratic <br>index shown as the yellow lines. The two y-axes allow simultaneous comparison of liters consumed (left) <br>and democratic scores (right).",
            xref="paper", yref="paper",
            x=-0.05, y=-0.36,
            showarrow=False,
            font=dict(size=12, color="black"),
            align="left"
        )
    ]
)

fig.show()

Argument 2 - A higher democratic index has a positive relationship with higher drug use.#

The violin plot shows the distribution of drug use rates per continent between 2000 and 2017, coloured by democratic index level. The width of a violin at a given value shows how many observations lie around that value, while its height shows the spread of drug use rates. Oceania is the most stable continent in terms of drug use rate, as indicated by the box in the middle of its violin, which marks where the middle half of the data points lie. Europe shows the highest democratic index value and also a larger spread in drug use rates, suggesting greater variability. Overall, there seems to be no strong linear relation between drug use rates and the democratic index.
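The democratic index levels are created by binning the index with `pd.cut`. The sketch below shows how `pd.cut` treats values that fall exactly on a bin edge, using a few invented scores: by default each bin includes its right edge and excludes its left edge.

```python
import pandas as pd

bins = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
scores = pd.Series([0.0, 0.2, 0.21, 0.8, 1.0])  # invented index values

# Default: intervals are (left, right], so exactly 0.0 falls outside the first bin.
default = pd.cut(scores, bins=bins)

# include_lowest=True extends the first interval to cover the lowest edge.
with_lowest = pd.cut(scores, bins=bins, include_lowest=True)

print(pd.DataFrame({"score": scores, "default": default, "with_lowest": with_lowest}))
```

A score of exactly 0.2 lands in the first bin and 0.8 in the fourth, while a score of exactly 0 would only be binned with `include_lowest=True`.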

df = pd.read_excel("Dataset_v1.xlsx")
df_clean = df.dropna(subset=['Continent', 'Avg_democratic_index', 'Drug_use_rate', 'Year'])
df_filtered = df_clean[(df_clean['Year'] >= 2000) & (df_clean['Year'] <= 2017)].copy()  # copy so the new column is not written to a view

bins = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
labels = ['0 - 0.2', '0.21 - 0.4', '0.41 - 0.6', '0.61 - 0.8', '0.81 - 1']
df_filtered['Democracy_Level'] = pd.cut(df_filtered['Avg_democratic_index'], bins=bins, labels=labels)

fig = px.violin(
    df_filtered,
    x='Continent',
    y='Drug_use_rate',
    color='Democracy_Level',
    box=True,
    points=False,
    category_orders={'Democracy_Level': labels},
    color_discrete_sequence=px.colors.qualitative.Bold[:3],
    width=1000,
    height=600
)

fig.update_traces(width=0.9, line=dict(width=1))

fig.update_layout(
    title='Drug Use Rate per Continent (2000–2017)<br>Colored by Democratic Index Level',
    xaxis_title='Continent',
    yaxis_title='Drug Use Rate (%)',
    legend_title='Democratic Index Level',
    paper_bgcolor='linen',
    plot_bgcolor='linen',
    width=790,
    height=600,
    title_x=0.5,
    margin=dict(t=100, b=160, l=50, r=50),
    annotations=[
        dict(
            text="This violin plot shows the distribution of drug use rates per continent between 2000 and 2017, <br>grouped by democratic index level. Hover over the violins to explore value ranges and medians.",
            xref="paper", yref="paper",
            x=-0.05, y=-0.43,
            showarrow=False,
            font=dict(size=12, color="black"),
            align="left"
        )
    ]
)

fig.show()

Summary#

Our visualizations revealed complex, context-dependent relationships between substance use, mental health and democracy from 2000 to 2017. Globally, higher drug use was linked to increased anxiety, while the association with depression varied: weakly positive overall but strongly negative in regions like Asia and Africa. Alcohol consumption showed mixed results. While some high-consumption countries (especially in Europe and North America) reported more anxiety, this wasn’t consistent worldwide. Wine and beer were more strongly associated with anxiety than spirits. A boxplot showed that countries with higher alcohol consumption also had higher median depression rates, suggesting a slight positive relationship. Finally, countries with a higher democratic index generally consumed more alcohol, though this was not universal. No clear global pattern was found between democracy and drug use. Importantly, these are correlations, not causal relationships.

Reflection#

After submitting our data story draft, we had a feedback session with our TA and first-year students on Thursday, June 19th. One group member took notes during our presentation. We received positive feedback on our progress. The TA and students noted that our perspectives were clear, and most graphs were well aligned with them.

Suggestions for improvement included:

  • Fixing graph aesthetics, such as ensuring axes start at zero and values are fully visible.

  • Replacing the drop-down menu in Graph 3 with side-by-side world maps for easier comparison.

  • Combining the last two graphs into one graph showing the democratic index and alcohol consumption per continent between 2000 and 2017.

We agreed as a team to implement the changes stated above and worked on this in the week after receiving the feedback.

Self-Reflection#

Looking back on the entire project, it would have been more productive to have a standing appointment each week to work on the project together and be able to discuss the progress. Due to conflicting schedules, this was not possible with the entire group at the same time, but we made the best of the situation and kept in touch through texting.

With more time, we could have searched longer for a more current dataset, as the datasets we chose only cover the years 2000 to 2017. These were the most recent datasets that we could find from the reliable sources listed in the project description for the data story.

Work Distribution#

We primarily communicated through group calls and a shared chat. After selecting suitable datasets, we formulated two perspectives, later adding a third to possibly help explain patterns found in the first two perspectives. The final cleaned and merged dataset was used to create visualizations and gives insight into the relations between the chosen variables.

Sophie

Sophie found and cleaned the datasets with Sarah, merged the datasets, wrote the introduction and worked on several arguments and graphs. She also did the proofreading and fixed the mistakes in the grammar and sentence structure.

Sarah

Sarah cleaned the datasets with Sophie, created various graphs, documented the feedback from the TA meeting, and wrote the reflection, self-reflection, and work distribution.

Tide

Tide created graphs and wrote arguments for the perspectives and ensured the design was colorblind-friendly.

Palina

Palina implemented feedback on the perspectives by improving the graphs. She also explored how to upload the project to GitHub.

References#